Scheduling of Hard Real-Time Multi-Thread Periodic Tasks 



Irina Lupu Joel Goossens 

PARTS Research Center 

Université libre de Bruxelles (U.L.B.)
CP 212, 50 av. F.D. Roosevelt 
1050 Brussels, Belgium 

{joel.goossens,irinlupu}@ulb.ac.be



Abstract 

In this paper we study the scheduling of parallel real-time recurrent tasks. Firstly, we propose a new parallel
task model which allows recurrent tasks to be composed of several threads; each thread requires a single processor
for execution, and the threads of a task can be scheduled simultaneously. Secondly, we define several kinds of
real-time schedulers that can be applied to our parallel task model. We distinguish between two scheduling classes:
hierarchical schedulers and global thread schedulers. We present and prove correct an exact schedulability test for
each class. Lastly, we evaluate the performance of our scheduling paradigm in comparison with Gang scheduling by
means of simulations.



1. Introduction 

In this research, we consider the preemptive scheduling of hard real-time tasks on identical multiprocessor
platforms (see [3, 2]). We deal with parallel real-time tasks, i.e., the case where each task instance (a process in
the following) may be executed on different processors simultaneously. More specifically, each process is composed
of several independent threads and each thread requires one processor to be executed; consequently, the process
can progress upon several processors simultaneously. In this research we study the schedulability problem of
recurring multi-thread tasks.

Nowadays, the design of parallel/multi-thread programs is common thanks to parallel programming paradigms 
like the Message Passing Interface (MPI [10]). Even better, sequential programs can be parallelized using tools
like OpenMP (see Q for details). 

Related Work. The literature concerning the scheduling of hard real-time parallel recurring tasks is relatively
sparse. We can report only a few models of parallel tasks and a few results (schedulers and
schedulability/feasibility tests). Manimaran et al. in [16] consider the non-preemptive EDF scheduling of periodic tasks.
Han et al. in [12] considered the scheduling of a (finite) set of real-time jobs allowing job parallelism, while we
consider the scheduling of either an infinite set of jobs (actually processes in our terminology) or, equivalently,
a set of periodic tasks. In a seminal work we contributed to the feasibility problem of parallel tasks. In [5]
we provided a task model which integrates job parallelism. We proved that the time-complexity of the feasibility
problem of these systems is linear relative to the number of (sporadic) tasks. In [14] Lakshmanan et al.
consider the fork-join task model where each task is an alternating sequence of sequential code and m parallel
segments. They provided a partitioning algorithm and a competitive analysis for EDF and the fork-join task
model. Regarding the schedulability of recurrent real-time tasks, and to the best of our knowledge, we can only
report results about Gang scheduling, where the execution requirement of a process corresponds to a C × v
rectangle, with the interpretation that the process requires exactly v processors simultaneously for a duration of
C time units. Kato et al. (see [13] for details) considered EDF Gang scheduling and provided a sufficient
schedulability condition. We studied Fixed Task Priority (FTP) Gang scheduling [8] and we provided an exact
schedulability test for periodic tasks.



This Research. In this paper we introduce a more realistic parallel task model, i.e., we study the scheduling of 
periodic and (parallel) multi-thread tasks. Our main contribution is exact schedulability tests for Fixed Subpro- 
gram Priority (FSP) and (FTP, FSP) schedulers. Additionally, we show that Gang scheduling and multi-thread 
scheduling are incomparable and we present an empirical study which shows that, in most of the cases, multi- 
thread scheduling dominates Gang scheduling. Last but not least, in the further work section, we consider a 
realistic extension of our task model which allows tasks to be a sequence of parallel phases (each one having its 
own number of threads) and we present a first negative result. 

Paper Organization. This paper is organized as follows. Section 2 provides our formal task model and important
definitions. In Section 3 we present a taxonomy of schedulers dedicated to our parallel task model. We
present exact schedulability tests in Section 4 for two classes of schedulers. In Section 5 we present an empirical
study which shows that in most of the cases thread scheduling dominates Gang scheduling. Lastly, in Section 6
we conclude, consider a realistic extension of our parallel task model and present a first negative result.

2. Formal model and definitions 

In this work we consider periodic multi-thread tasks. Each task $\tau_i$ is characterized by the tuple

$(O_i, \{q_i^1, q_i^2, \ldots, q_i^{v_i}\}, D_i, T_i)$,

where

• $O_i$ is the arrival instant, i.e., the moment of the first activation of the task since the system initialization;

• $\{q_i^1, q_i^2, \ldots, q_i^{v_i}\}$ is the set of the $v_i$ subprograms of $\tau_i$; at run-time these subprograms
generate threads which can be executed simultaneously, i.e., we allow task parallelism;

• each subprogram $q_i^j$ ($1 \le j \le v_i$) is characterized by an individual worst-case execution time $C_i^j$;

• $T_i$ is the period, i.e., the exact inter-arrival time between two successive activations of the task;

• $D_i$ is the relative deadline, i.e., the time by which the current instance of the task has to finish execution
relative to its arrival.

Throughout this paper, all timing characteristics in our model are assumed to be non-negative integers, i.e., 
they are multiples of some irreducible time interval (for example the CPU clock cycle is the smallest indivisible 
CPU time unit). 

Task/process & subprogram/thread. In this work we distinguish between off-line and run-time entities.
We consider that a task is defined off-line while its instance exists only at run-time under the denomination
process. In the same vein, we consider that a subprogram is defined off-line while its instance exists only at run-time
under the denomination thread. Consequently, the scheduler manages processes and/or threads, while the
process or thread priority may or may not be based on the static task and subprogram characteristics.

Each task system $\tau$ is composed of n such periodic and parallel tasks: $\tau = \{\tau_1, \ldots, \tau_n\}$. The
deadline of each task is less than or equal to the period: $\forall i$, $D_i \le T_i$ (constrained deadline model). As
each task has its own offset $O_i$, the task systems considered are asynchronous.

Since tasks are periodic, their subprograms have a periodic behavior as well. The $j^{th}$ thread of the $k^{th}$ process
of $\tau_i$ is characterized by the following parameters: an arrival time $a_k^j = O_i + (k - 1) \cdot T_i$, an execution
demand $e_k^j = C_i^j$ and an absolute deadline $d_k^j = a_k^j + D_i$.

The $k^{th}$ process of $\tau_i$ is characterized by an arrival time $a_k = O_i + (k - 1) \cdot T_i$, an execution demand
expressed by the parallel execution demands of its threads, $e_k = (e_k^1, \ldots, e_k^{v_i})$, and an absolute deadline
$d_k = a_k + D_i$.

A task is said to be active if it has a process with unfinished execution demand. 

A task is characterized by the measure called utilization: $u_i = \frac{\sum_{j=1}^{v_i} C_i^j}{T_i}$. This measure
represents the portion of the platform capacity ($u_i \le m$) requested by the task when executing. We also denote by
$U = \sum_{i=1}^{n} u_i$ the total system utilization. In the following, P denotes the least common multiple of all
the periods in the task system $\tau$: $P = \mathrm{lcm}\{T_1, \ldots, T_n\}$.
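
The quantities defined in this section are straightforward to compute. The following Python sketch is purely illustrative: the `Task` record and the function names are assumptions introduced here, and the utilization is assumed to be the sum of the subprogram execution times of a task divided by its period. The three tasks are those of Example 1 in Section 4.3.

```python
from dataclasses import dataclass
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical record for one periodic multi-thread task."""
    offset: int              # O_i
    wcets: Tuple[int, ...]   # (C_i^1, ..., C_i^{v_i}), one entry per subprogram
    deadline: int            # D_i, with D_i <= T_i (constrained deadline)
    period: int              # T_i


def utilization(task: Task) -> Fraction:
    # Assumed definition: u_i = (sum over j of C_i^j) / T_i
    return Fraction(sum(task.wcets), task.period)


def hyperperiod(tasks) -> int:
    # P = lcm(T_1, ..., T_n)
    return reduce(lambda a, b: a * b // gcd(a, b), (t.period for t in tasks), 1)


def process_window(task: Task, k: int) -> Tuple[int, int]:
    """Arrival time and absolute deadline of the k-th process (k >= 1)."""
    arrival = task.offset + (k - 1) * task.period
    return arrival, arrival + task.deadline


tasks = [Task(0, (2,), 3, 3), Task(0, (3,), 4, 4), Task(0, (2, 2), 12, 12)]
total_u = sum(utilization(t) for t in tasks)   # total system utilization U
```

Exact rationals (`Fraction`) are used so that comparisons such as $U \le m$ are not affected by floating-point rounding.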

The considered multiprocessor platform contains m unit-capacity processors. 



Figure 1. Gang scheduling vs. thread scheduling (panels: Gang Scheduling, Multi-thread Scheduling)



Gang scheduling versus Thread Scheduling. Figure 1 illustrates a Gang and a thread scheduling for the
"same" task set: $\tau_1 = (0, \{1\}, T_1, D_1)$, $\tau_2 = (0, \{1, 1\}, T_2, D_2)$. Focusing on $\tau_2$, Gang scheduling
has to manage the rectangle C × v = 1 × 2 while thread scheduling has to manage two 1-unit length threads.

From our point of view, the thread scheduling paradigm resolves the following Gang scheduling drawbacks:

1. As exhibited in [8], Gang scheduling suffers from priority inversion, i.e., a lower priority task can progress
while a higher priority active task cannot.

2. Under Gang scheduling, the number of processors required by a task must not be larger than the platform size.

3. Because of the requirement that a task must execute upon exactly v processors simultaneously, very
often many processors may be left idle while there are active tasks.

4. As shown in [8], Gang FTP schedulers are not predictable. Multi-thread FTP schedulers are proven
predictable in Section 4.

In this proposal, the exact schedulability tests are based on feasibility intervals with the following definition.

Definition 1 (Feasibility interval). For any task system $\tau = \{\tau_1, \ldots, \tau_n\}$ and any m unit-capacity
multiprocessor platform, the feasibility interval is a finite interval such that if no deadline is missed while
considering only the processes in this interval, no deadline will ever be missed.

In this paper we consider that the scheduling is priority-driven: the threads are assigned distinct priority 
levels. According to these priority levels the scheduler decides at each time t what will be executed on the mul- 
tiprocessor platform at that time instant: the m highest (if any) priority threads will be executed simultaneously 
on the given platform. The thread-processor assignment is uniquely determined by the following rule: "the higher
the priority, the lower the processor index". If fewer than m threads are active, the processors with the higher
indexes are left idle.
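
The dispatching rule described above can be sketched as follows. This is only an illustration: the function name `dispatch` and the encoding of priorities as integers (a smaller number meaning a higher priority) are assumptions made here.

```python
def dispatch(active, m):
    """Assign active threads to m processors for one time instant.

    active: list of (priority, thread_id) pairs with distinct priorities;
            a smaller number is assumed to mean a higher priority.
    Returns a list of length m; entry p holds the thread running on
    processor p, or None if that processor is left idle.
    """
    ranked = sorted(active)            # highest priority first
    assignment = [None] * m            # higher-index processors stay idle
    for cpu, (_, thread_id) in enumerate(ranked[:m]):
        assignment[cpu] = thread_id    # higher priority -> lower processor index
    return assignment


busy = dispatch([(3, "c"), (1, "a"), (2, "b")], m=2)
partly_idle = dispatch([(1, "a")], m=3)
```

Calling such a function at every time instant, with the current set of active threads, yields a preemptive schedule: a thread that drops out of the m highest priorities simply stops appearing in the assignment.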

We consider preemptive scheduling: a higher priority thread can interrupt the executing lower priority thread.

3. Taxonomy of schedulers

In this work we consider two classes of real-time schedulers for our parallel task model: hierarchical schedulers 
and global thread schedulers. 

• At top-level, hierarchical schedulers manage processes with a process-level scheduling rule and use a second
(low-level) scheduling rule to manage threads within each process.

• Global thread schedulers assign priorities to threads regardless of the tasks/ subprograms that generated 
them. 

In order to define rigorously our Hierarchical and Global schedulers we have to introduce the following 
schedulers. 

Definition 2 (Fixed Task Priority (FTP)). A fixed task priority scheduler assigns a fixed and distinct priority to 
each task before the execution of the system. At run-time each process priority corresponds to its task priority. 

Among the FTP schedulers we can mention Rate-Monotonic (RM) [15] and Deadline-Monotonic (DM) [1].

Definition 3 (Fixed Process Priority (FPP)). A fixed process priority scheduler assigns a fixed and distinct 
priority to processes upon arrival. Each process preserves the priority level during its entire execution. 

The Earliest Deadline First (EDF) [15] scheduler is an example of an FPP scheduler.

Definition 4 (Dynamic Process Priority (DPP)). A dynamic process priority scheduler assigns, at each time t,
priorities to the active processes according to their run-time characteristics. Consequently, during its execution, 
a process may have different priority levels. 

The Least Laxity First (LLF) scheduler is a DPP scheduler since the laxity is a dynamic process characteristic 
(see Q for details). 

In the same vein, the following schedulers can be defined at thread level: 

Definition 5 (Fixed Subprogram Priority (FSP)). A fixed subprogram priority scheduler assigns a fixed and 
distinct priority to each subprogram before the execution of the system. At run-time each thread priority corre- 
sponds to its subprogram priority. 

An example of FSP scheduler is the Longest Subprogram First scheduler. 

Definition 6 (Fixed Thread Priority (FThP)). A fixed thread priority scheduler assigns a fixed and distinct 
priority to threads upon arrival. Each thread preserves the priority level during its entire execution. 

As far as we know, no FThP scheduler, other than those already belonging to the FSP class, can be defined
based only on the characteristics of the tasks in our model.

Definition 7 (Dynamic Thread Priority (DThP)). A dynamic thread priority scheduler assigns, at time t, priori- 
ties to the existing threads according to their characteristics. During its execution, a thread may have different 
priority levels. 

An example of DThP is LLF applied at thread level. 

3.1. Hierarchical schedulers 

Hierarchical schedulers are built following the next two steps: 

1. at process level, one of the following schedulers is chosen in order to assign priorities to processes: FTP, FPP
and DPP;

2. for assigning priorities within a process, one of the following schedulers is chosen: FSP, FThP or DThP.

In the following, a hierarchical scheduler will be denoted by the couple $(\alpha, \beta)$, where
$\alpha \in \{FTP, FPP, DPP\}$ and $\beta \in \{FSP, FThP, DThP\}$.

3.2. Global thread schedulers

As global thread schedulers, the FSP, FThP and DThP schedulers can be applied to a set of subprograms or 
threads regardless of the task that they belong to. 

Notice that some global thread schedulers are identical to some hierarchical ones. E.g., a total order between
threads (i.e., an FThP scheduler) can "mimic" any hierarchical (FTP, FThP) scheduler.

4. Exact schedulability tests 

In this section, we will present two exact schedulability tests: one for FSP and one for (FTP, FSP) schedulers. 

4.1. Exact FSP schedulability test 

The first step in defining the schedulability test for FSP schedulers is to prove that their schedules are
periodic. The proof is based on the periodicity of FTP schedules when FTP is applied to systems $\tau'$ with
the following task model: a task $\tau'_k \in \tau'$ is characterized by $(O_k, C_k, D_k, T_k)$ [6]. This model will be
called in this paper the sequential task model. The periodicity of FTP schedules for the sequential task model is
stated in Theorem 8. The tasks $\tau'_k$ in $\tau'$ ($1 \le k \le n$) are ordered by decreasing priority:
$\tau'_1 > \cdots > \tau'_n$.

Theorem 8 ([6]). For any preemptive FTP scheduling algorithm A, if an asynchronous constrained deadline
system $\tau' = \{\tau'_1, \ldots, \tau'_n\}$ is A-feasible, then the A-schedule of $\tau'$ on an m unit-capacity
multiprocessor platform is periodic with a period of P starting from instant $S_n$, where $S_i$ is defined as:

$S_1 = O_1$, $S_i = \max\{O_i, O_i + \lceil \frac{S_{i-1} - O_i}{T_i} \rceil \cdot T_i\}$, $\forall i \in \{2, 3, \ldots, n\}$.   (1)

(Assuming that the execution time of each task is constant.)

In the following, we consider a task system $\tau$ with $r = \sum_{i=1}^{n} v_i$ subprograms; each subprogram $q_i^j$
of the task $\tau_i$ is characterized by $(O_i, C_i^j, T_i, D_i)$. An FSP scheduler is used to assign priorities to the
r subprograms. As these priorities are assigned regardless of the tasks to which the subprograms belong, we can
consider a simpler notation: each subprogram $q^\ell$ is characterized by $(O_\ell, C_\ell, T_\ell, D_\ell)$
($1 \le \ell \le r$); if $q^\ell$ corresponds to the $j^{th}$ subprogram of $\tau_i$, then $O_\ell = O_i$, $C_\ell = C_i^j$,
$D_\ell = D_i$, $T_\ell = T_i$. In the following we assume without loss of generality that subprograms are ordered by
decreasing FSP priority: $q^1 > \cdots > q^r$.

Theorem 9. For any preemptive FSP scheduling algorithm A, if an asynchronous constrained deadline system
$\tau$ containing r subprograms (regardless of the tasks they belong to) is A-feasible, then the A-schedule of $\tau$
on an m unit-capacity multiprocessor platform is periodic with a period of P starting from instant $S^*_r$, where
$S^*_j$ is defined as follows:

$S^*_1 = O_1$, $S^*_j = \max\{O_j, O_j + \lceil \frac{S^*_{j-1} - O_j}{T_j} \rceil \cdot T_j\}$, $\forall j \in \{2, 3, \ldots, r\}$.   (2)

(Assuming that the execution time of each subprogram is constant.)

Proof. When using FSP, a parallel task system $\tau$ with n tasks and r subprograms can be seen as a sequential task
system $\tau'$ which contains r sequential periodic tasks, $\tau' = \{\tau'_1, \ldots, \tau'_r\}$, such that the task
$\tau'_\ell$ ($1 \le \ell \le r$) has the same characteristics as the subprogram $q^\ell$ of $\tau$
($\tau'_\ell : (O_\ell, C_\ell, T_\ell, D_\ell)$). From the FSP priority assignment on $\tau$, an FTP priority assignment
for $\tau'$ can be defined: if $q^1 > \cdots > q^r$ according to FSP, the corresponding sequential tasks have the
following order $\tau'_1 > \cdots > \tau'_r$ according to FTP.

By Theorem 8, we know that the schedule of FTP on $\tau'$ is periodic with a period of P starting with $S_r$. We
can observe that $S_r$ has the same value as $S^*_r$. This means that the FSP schedule on $\tau$ is periodic with a
period of P starting with $S^*_r$. □

Theorem 9 considers that the execution time $C_\ell$ of a subprogram $q^\ell$ ($1 \le \ell \le r$) is constant. In order
to define the schedulability test for the FSP schedulers, we have to prove that they are predictable.

Definition 10 (Predictability). Let us consider the sets of threads J and J' which differ only with regard to
their execution times: the threads in J have execution times less than or equal to the execution times of the
corresponding threads in J'. A scheduling algorithm A is predictable if, when applied independently on J and
J', a thread in J finishes execution before or at the same time as the corresponding thread in J'.

Theorem 11. FSP schedulers are predictable. 

Proof. We mentioned in the proof of Theorem 9 that the task system $\tau$ containing r subprograms can be
seen as a system $\tau'$ of r sequential tasks such that a task $\tau'_\ell$ inherits the characteristics of the
corresponding subprogram $q^\ell$ of $\tau$ ($1 \le \ell \le r$). An FTP priority assignment for $\tau'$ can be built
following the priorities assigned by FSP to the corresponding subprograms in $\tau$: $q^1 > \cdots > q^r$ gives
$\tau'_1 > \cdots > \tau'_r$.

In [11] it is proven that for systems like $\tau'$, FTP schedulers are predictable on m unit-capacity multiprocessor
platforms. Since $\tau$ is equivalent to $\tau'$ and the FTP scheduler assigns the same priorities to sequential tasks as
FSP to the corresponding subprograms, FSP schedulers are also predictable. □

Based on Theorems 9 and 11 we can define an exact feasibility test for FSP schedulers.

Schedulability Test 1. For any preemptive FSP scheduler A and for any A-feasible asynchronous constrained
deadline system $\tau$ containing r subprograms (regardless of the tasks they belong to) on an m unit-capacity
multiprocessor platform, $[0, S^*_r + P)$ is a feasibility interval, where $S^*_r$ is defined by equation (2).

Proof. This is a direct consequence of Theorems 9 and 11. □
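
The bounds of this feasibility interval can be computed in a few lines. The sketch below is illustrative and rests on an assumption: it takes $S^*_1$ to be the offset of the highest priority subprogram and each following $S^*_j$ to be the first arrival of subprogram j at or after $S^*_{j-1}$, which is how the proof of Theorem 12 describes the quantity. The function names and the sample offsets and periods are hypothetical.

```python
from functools import reduce
from math import gcd


def periodic_start(subprograms):
    """Return S*_r for a list of (offset, period) pairs.

    The pairs are assumed to be listed by decreasing priority.
    """
    s = subprograms[0][0]                         # S*_1 = O_1
    for offset, period in subprograms[1:]:
        steps = -((offset - s) // period)         # integer ceil((s - offset) / period)
        s = max(offset, offset + steps * period)  # first arrival at or after s
    return s


def feasibility_interval(subprograms):
    """Return the bounds (0, S*_r + P) of the half-open feasibility interval."""
    p = reduce(lambda a, b: a * b // gcd(a, b),
               (period for _, period in subprograms), 1)
    return 0, periodic_start(subprograms) + p


# Hypothetical system: three subprograms given as (offset, period).
interval = feasibility_interval([(2, 4), (0, 6), (5, 3)])
```

For this sample, the successive values are 2, 6 and 8, and the least common multiple of the periods is 12, so simulating the schedule over [0, 20) would be enough to decide schedulability under the stated assumptions.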



4.2. Exact (FTP, FSP) schedulability test 

The first step in the definition of the exact schedulability test for the (FTP, FSP) schedulers is to prove the 
periodicity of the feasible schedules. 

Theorem 12. For any preemptive (FTP, FSP) scheduling algorithm A, if an asynchronous constrained deadline
parallel task system $\tau = \{\tau_1, \ldots, \tau_n\}$ is A-feasible, then the A-schedule of $\tau$ on an m unit-capacity
multiprocessor platform is periodic with a period of P starting from instant $S_n$, where $S_i$ is defined by
equation (1) and tasks are ordered by decreasing priority: $\tau_1 > \tau_2 > \cdots > \tau_n$.

Proof. Let us consider that the tasks in $\tau$ and their subprograms are ordered by decreasing priority:

$\tau_1 > \tau_2 > \cdots > \tau_n$ with $q_i^1 > \cdots > q_i^{v_i}$, $\forall\, 1 \le i \le n$.

Following these priority orders, we can define an FSP scheduler B which assigns the following priorities to the
$r = \sum_{i=1}^{n} v_i$ subprograms of $\tau$:

$q_1^1 > q_1^2 > \cdots > q_1^{v_1} > q_2^1 > q_2^2 > \cdots > q_2^{v_2} > \cdots > q_{n-1}^1 > \cdots > q_{n-1}^{v_{n-1}} > q_n^1 > \cdots > q_n^{v_n}$.   (3)

The FSP schedulers assign priorities to subprograms regardless of the tasks they belong to. So we can rewrite
equation (3) regardless of the tasks $\tau_1, \ldots, \tau_n$:

$q^1 > \cdots > q^{v_1} > \cdots > q^{1 + \sum_{i=1}^{n-1} v_i} > \cdots > q^r$.

By Theorem 9, the schedule generated by B is periodic with a period of P from $S^*_r$. We can observe that
the $S^*_\ell$ quantity defined by equation (2) represents the instant of the first arrival of $q^\ell$ at or after time
instant $S^*_{\ell-1}$. Since all the subprograms belonging to a task $\tau_i$ ($1 \le i \le n$) have the same activation
times and the same periods, and B assigns consecutive priorities to the subprograms of the same task (as seen in
equation (3)):

$S^*_j = S^*_{j-1}$, $\forall j : \left(1 + \sum_{k=1}^{i-1} v_k\right) < j \le \sum_{k=1}^{i} v_k$ and $1 \le i \le n$.   (4)

Furthermore, we can observe that $S^*_1 = S_1 = O_1$. From this fact and equation (4) we can conclude:

$S^*_r = S_n$.

The B-schedule is then periodic with a period of P starting from $S_n$. Since the B-schedule is the same as the
one generated by A, the A-schedule is also periodic with a period of P starting from $S_n$. □
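
The key step of this proof, namely that flattening the hierarchical priorities into a single subprogram order does not move the start of the periodic part, can be checked numerically. The sketch below is an illustration under assumptions: the recurrence is taken to be "first arrival at or after the previous value", and the names `start_instant` and `flatten` as well as the sample task parameters are hypothetical.

```python
def start_instant(entries):
    """Apply the recurrence to (offset, period) pairs listed by decreasing priority."""
    s = entries[0][0]
    for offset, period in entries[1:]:
        # offset + ceil((s - offset) / period) * period, using integer arithmetic
        s = max(offset, offset - ((offset - s) // period) * period)
    return s


def flatten(tasks):
    """Build the subprogram order of scheduler B from (offset, period, v) tasks.

    The v subprograms of a task receive consecutive priorities and inherit
    the offset and period of their task.
    """
    return [(offset, period) for offset, period, v in tasks for _ in range(v)]


# Hypothetical task system: (O_i, T_i, v_i), by decreasing task priority.
tasks = [(2, 4, 2), (0, 6, 1), (5, 3, 3)]
s_n = start_instant([(o, t) for o, t, _ in tasks])   # task-level value
s_star_r = start_instant(flatten(tasks))             # subprogram-level value
```

Both values coincide for this sample because consecutive subprograms of the same task share their arrival instants, so the recurrence leaves the value unchanged within a task.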

We will now prove that the (FTP, FSP) schedulers are also predictable. 

Theorem 13. (FTP, FSP) schedulers are predictable.

Proof. Since, based on any (FTP, FSP) scheduler, we can define an FSP scheduler as shown in the proof of
Theorem 12 and since, by Theorem 11, FSP schedulers are predictable, (FTP, FSP) schedulers are predictable
as well. □

We will now define the exact schedulability test for (FTP, FSP) schedulers. 

Schedulability Test 2. For any preemptive (FTP, FSP) scheduler A and for any A-feasible asynchronous
constrained deadline parallel task system $\tau = \{\tau_1, \ldots, \tau_n\}$ on an m unit-capacity multiprocessor
platform, $[0, S_n + P)$ is a feasibility interval, where $S_n$ is defined by equation (1).

Proof. This is a direct consequence of Theorems 12 and 13. □



4.3. Gang and thread scheduling are incomparable 

In this section we will show that Gang FTP and thread hierarchical (FTP, FSP) schedulers are incomparable 
— in the sense that there are task systems which are schedulable using Gang scheduling approaches and not by 
thread scheduling approaches, and conversely. 

The considered FTP scheduler is DM [1]: the shorter the relative deadline of a task, the higher the priority
assigned to it by DM. The FSP scheduler is called Index Monotonic (IM) and it assigns priorities as follows: the
lower the index of the subprogram within the task, the higher the priority.

In the following examples, the task offsets of the considered systems are equal to 0 and the feasible schedules
are periodic from 0 with a period of P (Theorem 12 and [8]).

Figure 2. Gang DM unfeasible, (DM, IM) feasible



Example 1. This first example presents a task system that is unschedulable by Gang DM, but schedulable
by (DM, IM) on a 2-unit capacity multiprocessor platform. The tasks in the system
$\tau = \{\tau_1, \tau_2, \tau_3\}$ have the following characteristics: $\tau_1 : (0, \{2\}, 3, 3)$,
$\tau_2 : (0, \{3\}, 4, 4)$ and $\tau_3 : (0, \{2, 2\}, 12, 12)$. According to DM, $\tau_1 > \tau_2 > \tau_3$.

We can observe in Figure 2 that, according to Gang DM, task $\tau_3$ has to wait for 2 simultaneously available
processors in order to execute. This first happens at time instant 11; however, at time 12 the task still has
unfinished execution demand and it misses its deadline.

In the case of (DM, IM), $\tau_1$ and $\tau_2$ execute at the same moments and on the same processors as in Gang DM.
The difference is that $\tau_3$ can start executing its first process at time instant 2 since one processor is available.
Taking advantage of the fact that the processors are left idle by $\tau_1$ and $\tau_2$ at some moments in time, the first
process of $\tau_3$ (which is the only process in the interval [0, 12)) finishes execution at time instant 8. No deadline
is missed, therefore the system is schedulable by (DM, IM).

Example 2. The second example presents a task system $\tau = \{\tau_1, \tau_2, \tau_3\}$ which is schedulable with
Gang DM, but unschedulable with (DM, IM) on a 3-unit capacity multiprocessor platform. The tasks of the system have
the following characteristics: $\tau_1 : (0, \{3, 3\}, 4, 4)$, $\tau_2 : (0, \{1, 1\}, 5, 5)$ and
$\tau_3 : (0, \{9\}, 10, 10)$. According to DM, $\tau_1 > \tau_2 > \tau_3$.

In Figure 3 we can observe that, according to Gang DM, at instant 0, $\tau_1$ is assigned to 2 of the 3 processors in
the platform. Since there is only one processor left, $\tau_2$ cannot execute, therefore $\tau_3$ starts its execution on
the third processor. At time instant 3, 2 processors are available and, consequently, $\tau_2$ may start executing, etc.
No deadline is missed in the time interval [0, 12), therefore the system is Gang DM feasible.

According to (DM, IM), even if $\tau_1$ occupies 2 of the 3 processors in the platform, $\tau_2$ may execute a first
thread on the third processor from time instant 0 to time instant 1. The second thread of its first process will
execute on the third processor from time instant 1 to time instant 2. We can conclude that $\tau_3$ will miss its
deadline at time instant 10 since it has 9 units of execution demand and only 6 time units available until its deadline.
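
Both examples can be replayed with a small discrete-time simulator. The following Python sketch is an illustration, not the simulator used for the empirical study: the `Task` tuple, the `simulate` function and the Gang rule it applies (a process runs only if as many processors as it has threads are still free after serving higher priority processes, otherwise it is skipped for that time unit) are assumptions chosen to match the behaviour described in the two examples. Tasks must be listed by decreasing DM priority, and all threads of a Gang task are assumed to have the same execution time.

```python
from collections import namedtuple

# Hypothetical record: offset O, per-subprogram WCETs, period T, relative deadline D.
Task = namedtuple("Task", "offset wcets period deadline")


def simulate(tasks, m, horizon, gang):
    """Simulate [0, horizon) in unit steps on m processors.

    gang=True : Gang FTP (all threads of a process run together or not at all).
    gang=False: (FTP, FSP) with Index Monotonic order inside each process.
    Returns (feasible, finish) where finish[i] lists the finishing times of
    the processes of task i.
    """
    remaining = [None] * len(tasks)   # remaining demand per thread, None if no active process
    due = [0] * len(tasks)            # absolute deadline of the active process
    finish = [[] for _ in tasks]
    for t in range(horizon + 1):
        for i, task in enumerate(tasks):
            if remaining[i] is not None and t >= due[i]:
                return False, finish              # deadline miss
            if t >= task.offset and (t - task.offset) % task.period == 0:
                remaining[i] = list(task.wcets)   # new process arrives
                due[i] = t + task.deadline
        if t == horizon:
            break
        free = m
        for i in range(len(tasks)):               # decreasing task priority
            rem = remaining[i]
            if rem is None:
                continue
            if gang:
                if len(rem) > free:
                    continue                      # not enough processors: skipped
                run = list(range(len(rem)))
            else:
                pending = [j for j, r in enumerate(rem) if r > 0]
                run = pending[:free]              # lowest subprogram index first
            for j in run:
                rem[j] -= 1
            free -= len(run)
            if all(r == 0 for r in rem):
                remaining[i] = None
                finish[i].append(t + 1)
    return True, finish


ex1 = [Task(0, (2,), 3, 3), Task(0, (3,), 4, 4), Task(0, (2, 2), 12, 12)]
ex2 = [Task(0, (3, 3), 4, 4), Task(0, (1, 1), 5, 5), Task(0, (9,), 10, 10)]

ex1_gang_ok, _ = simulate(ex1, m=2, horizon=12, gang=True)
ex1_thread_ok, ex1_finish = simulate(ex1, m=2, horizon=12, gang=False)
ex2_gang_ok, _ = simulate(ex2, m=3, horizon=20, gang=True)
ex2_thread_ok, _ = simulate(ex2, m=3, horizon=20, gang=False)
```

The horizons 12 and 20 are the least common multiples of the periods of each system; since all offsets are 0, they cover one full period of the schedule.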

Figure 3. Gang DM feasible, (DM, IM) unfeasible



5. Empirical Study 

The purpose of this empirical study is to evaluate the performance of multi-thread schedulers compared with 
the one of Gang schedulers. More specifically, the chosen multi-thread scheduler is the (FTP, FSP) scheduler 
(DM, IM). Among the Gang schedulers, we consider Gang DM. 

Since Gang FTP schedulers are not predictable, in this study we consider constant execution times. From [8]
and Schedulability Test 2 we know that we have to simulate both Gang FTP and (FTP, FSP) schedulers in
the time interval $[0, S_n + P)$ in order to conclude whether the task system is feasible with one of them or with both.
In this empirical study, the execution times of all the subprograms of a task are considered equal.

5.1. Evaluation criteria 

Gang DM and (DM,IM) are evaluated according to the following criteria: 

• success ratio; 

• worst-case response time of the lowest priority task in the system. 

Definition 14 (Response time). The response time of a process represents the duration between the arrival of
the process and the moment it finishes execution.

Definition 15 (Worst-Case Response Time (WCRT)). The worst-case response time of a task is the maximum 
response time of its processes. 

Success ratio. The success ratio represents the portion of successfully scheduled task systems and it is defined
as follows:

success ratio = (number of successfully scheduled task systems) / (total number of considered task systems)

Worst-case response time. If system $\tau$ is schedulable, the worst-case response times of a task
$\tau_i \in \tau$ for Gang DM and (DM, IM) are calculated according to its processes within the time interval
$[0, S_n + P)$.

For each task system feasible with both Gang DM and (DM, IM) we compare the WCRTs (computed for
each of the two schedulers) of the lowest priority task. For a given system utilization, we count separately
the task systems where the (DM, IM) WCRT is strictly smaller than the one computed under Gang DM and
conversely. Consequently, the uncounted task systems are those where the computed WCRTs are equal for the
two schedulers.
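
The two criteria can be expressed compactly. The sketch below is illustrative only: the function names are assumptions, and the sample inputs are made-up values used to exercise the code, not results of the study.

```python
def success_ratio(outcomes):
    """outcomes: one boolean per considered task system (True = scheduled)."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes)


def compare_wcrt(pairs):
    """pairs: (thread_wcrt, gang_wcrt) of the lowest priority task, one pair
    per task system schedulable by both schedulers.

    Returns the fraction of systems where the thread WCRT is strictly smaller
    and the fraction where the Gang WCRT is strictly smaller; ties are not counted.
    """
    thread_better = sum(1 for thread, gang in pairs if thread < gang)
    gang_better = sum(1 for thread, gang in pairs if gang < thread)
    return thread_better / len(pairs), gang_better / len(pairs)


# Made-up inputs, only to show the calling convention.
ratio = success_ratio([True, False, True, True])
fractions = compare_wcrt([(5, 7), (6, 6), (9, 8), (3, 4)])
```

Here one of the four sample pairs is a tie, so the two returned fractions do not sum to 1.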

5.2. Task system generation methodology 

The procedure for task system generation is the following: individual tasks are generated and added to the
system until the total system utilization exceeds the platform capacity (m).

The characteristics of a task $\tau_i$ are integers and they are generated as follows:

1. the period $T_i$ is uniformly chosen from [1, 250];

2. the offset $O_i$ is uniformly chosen from $[1, T_i]$;

3. the utilization $u_i$ of the task does not exceed m and it is generated using the following distributions:

• uniform distribution between [^,m]; 

• bimodal distribution: light tasks have an uniform distribution between heavy tasks have an 
uniform distribution between , to]; the probability of a task being heavy is of |; 

• exponential distribution of mean ™; 

• exponential distribution of mean ^; 

• exponential distribution of mean . 

4. $v_i$ is uniformly chosen from [1, m];

5. since we consider that all the subprograms of a task have equal execution times, it is sufficient to
compute a single execution time value: $C_i = \frac{u_i \cdot T_i}{v_i}$;

6. the deadline $D_i$ is uniformly chosen from $[C_i, T_i]$.

We use several distributions (with different means) in order to generate a wide variety of task systems and, 
consequently, to have more accurate simulation results. 

The generated systems have a least common multiple of the task periods bounded by 5,000,000. During 
simulations, we considered multiprocessor platforms containing 2, 4, 8 and 16 processors. A total of 450,000 
task systems were generated. 

5.3. Results

Firstly, we will analyze the success ratio of the two schedulers. 

Success ratio. Each of the following figures contains 3 plots: one represents the success ratio of the (DM, IM) 
multi-thread scheduler, a second one the success ratio of Gang DM and a third one expresses the ratio of task 
systems successfully scheduled by both of them. 

The performance gap between the two schedulers in terms of success ratio grows with the number of
processors, as shown in Figures 4 to 7. In the case of the 2-processor platform, they have similar performance.
On 4 processors, (DM, IM) successfully schedules up to 10% more task systems than Gang DM (this is
attained at utilization 2.8). On 8 processors it schedules up to 12% more task systems (at utilization 5.2) and on 16
processors it can schedule up to 14% more task systems than Gang DM (at utilization 10.4).

We can also observe in each of the Figures 4 to 7 that (DM, IM) and Gang DM are incomparable, since the
corresponding plots are above the "both" plot in each of these figures. Moreover, the amount of additional
systems that thread scheduling can manage is considerably larger. In Figure 4 we can see that, on a 2-processor
platform, the amount of additional systems that (DM, IM) can schedule is at most 2 times higher (at utilization
1.6) than the amount of additional systems scheduled by Gang DM. In the case of a 4-processor platform, (DM, IM)
can perform 4.3 times better than Gang DM (at utilization 2.8), on 8 processors 5.4 times (at utilization 5.2)
and, finally, on 16 processors it can perform 7.5 times better than Gang DM (at utilization 10.4).

Figure 4. Success Ratio: 2 processors

Figure 5. Success Ratio: 4 processors



Response time. In the following we will reference Figures 8 to 10. The utilization of the considered systems
is greater than 25% and less than 90% of the platform capacity.

Figure 7. Success Ratio: 16 processors



In each figure, there are two plots: one that marks the portion of task systems (among those which are
schedulable by both Gang DM and (DM, IM)) where the (DM, IM) WCRT of the lowest priority task is strictly
smaller than the one computed under Gang DM; a second plot marks the opposite behavior.

The results of our simulations showed that, for systems executed on 2 processors, the WCRTs of the lowest 
priority task under the two schedulers are equal. 

On 4, 8 and 16-unit capacity multiprocessor platforms, (DM, IM) outperforms Gang DM as we can see in
Figures 8 to 10. For all of the three cases, the (DM, IM) WCRT is smaller than the Gang DM WCRT in at most
50% of the considered task systems. The lowest performance gap is observed on the 4-processor platform (at
utilization 1.4) and, even in that case, (DM, IM) performs 8% better than Gang DM.



6. Conclusions and future work 



In this work we considered multi-thread scheduling for parallel real-time systems. The main advantage
of this model is that it does not require all threads of the same task to execute simultaneously, as Gang scheduling
does.

We defined in this paper several types of priority-driven schedulers dedicated to our parallel task model
and scheduling method. We distinguished between hierarchical schedulers (that firstly assign distinct priorities
at system level and secondly, within each task) and global thread schedulers (that do not take into account the
original tasks when priorities are assigned at thread level).

We showed that, contrary to Gang FTP, the hierarchical and global thread schedulers based on FTP and FSP 
are predictable. Based on this property and the periodicity of their schedules, we defined two exact 
schedulability tests. 

Even though the Gang and multi-thread schedulers are, as we have shown, incomparable, the empirical 
study confirmed the intuition that our approach outperforms Gang scheduling. In terms of success ratio, the 
performance gap increases as the number of processors grows. 

Figure 8. WCRT comparison: 4 processors 
Figure 9. WCRT comparison: 8 processors 
Figure 10. WCRT comparison: 16 processors 

Future work. While we have shown the benefits of our thread scheduling model in comparison with Gang 
scheduling, in further work we would like to extend our model towards more realistic parallel task models. In 
particular, we aim to consider multi-phase tasks, in the sense that a task can have an arbitrary number of phases 
that have to be executed sequentially. Each phase would be characterized by its own degree of parallelism. E.g., 
a task could be composed of initialization, computing and finalization phases; only the computing phase can 
be parallelized, the others are sequential. Unfortunately, we have to report a first negative result: multi-phase 
multi-thread hierarchical schedulers are not predictable. 

We extend the model as follows: each task $\tau_i$ is characterized by: 

$$(O_i, \{\phi_i^1, \phi_i^2, \ldots, \phi_i^{\ell_i}\}, D_i, T_i)$$ 

with the interpretation that the task is composed of a sequence of $\ell_i$ phases; each phase $\phi_i^j$ is 
characterized by a set of subprograms $q_{i,j}^1, q_{i,j}^2, \ldots, q_{i,j}^{w_{i,j}}$ ($w_{i,j}$ is the number of 
subprograms in phase $j$ of task $i$). Each thread is defined as in our original model. 

Consider for instance $\tau_1 = (0, \phi_1^1 = \{2\}, \phi_1^2 = \{2, 2, 2\})$, $\tau_2 = (1, \phi_2^1 = \{1\})$ (we omit 
here the deadline and period characteristics). Figure 11 shows that if we consider the worst-case duration of 
each thread of $\tau_1$, the single thread of $\tau_2$ completes at time 2, while if we reduce the duration of the first 
phase of task $\tau_1$ by one time unit then $\tau_2$ completes at time 4. The example shows the non-predictability of 
multi-phase multi-thread hierarchical schedulers. 
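
The anomaly in the two-task example above can be reproduced with a small discrete-time simulation. The sketch below is only an illustration under stated assumptions: the platform has three identical processors (the platform size is not stated in this passage), the first task has the higher priority, time is slotted in unit steps, and a phase is released only once every thread of the previous phase has completed. The function name `simulate` and its input layout are hypothetical.

```python
from typing import List, Optional, Tuple

# A task is (offset, phases); each phase is a list of thread durations.
Task = Tuple[int, List[List[int]]]


def simulate(tasks: List[Task], m: int, horizon: int = 50) -> List[Optional[int]]:
    """Return the completion time of each task (None if not done by horizon).

    Tasks are listed by decreasing priority. In every unit time slot the
    m processors are handed to the pending threads of the current phase
    of each active task, in priority order (assumed hierarchical policy).
    """
    phase = [0] * len(tasks)
    remaining = [list(phases[0]) for _, phases in tasks]
    done: List[Optional[int]] = [None] * len(tasks)
    for t in range(horizon):
        free = m  # processors still idle in slot [t, t+1)
        for i, (offset, phases) in enumerate(tasks):
            if done[i] is not None or t < offset:
                continue
            for k, left in enumerate(remaining[i]):
                if free == 0:
                    break
                if left > 0:
                    remaining[i][k] -= 1  # thread k runs for one slot
                    free -= 1
            if all(left == 0 for left in remaining[i]):
                phase[i] += 1
                if phase[i] == len(phases):
                    done[i] = t + 1
                else:
                    # The next phase becomes ready at the next slot.
                    remaining[i] = list(phases[phase[i]])
    return done


worst_case = [(0, [[2], [2, 2, 2]]), (1, [[1]])]
shorter_first_phase = [(0, [[1], [2, 2, 2]]), (1, [[1]])]

print(simulate(worst_case, m=3))           # [4, 2]
print(simulate(shorter_first_phase, m=3))  # [3, 4]
```

Shortening the first phase releases the three-thread phase one slot earlier, so it occupies all three assumed processors exactly when the second task arrives, which delays that task from time 2 to time 4.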



Figure 11. Multiphase: unpredictability (two panels over a time axis from 1 to 6: "Worst case execution times" 
and "An execution time lower than the worst case") 



References 

[1] N. Audsley, A. Burns, M. F. Richardson, and A. J. Wellings. Hard real-time scheduling: The deadline-monotonic 
approach. In IEEE Workshop on Real-Time Operating Systems and Software, pages 133-137, 1991. 

[2] T. P. Baker. An analysis of EDF scheduling on a multiprocessor. IEEE Trans. on Parallel and Distributed Systems, 
15(8):760-768, 2005. 

[3] T. P. Baker and S. Baruah. Schedulability analysis of multiprocessor sporadic task systems. Handbook of Real-Time and 
Embedded Systems, 2006. 

[4] R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, and R. Menon. Parallel programming in OpenMP. Morgan 
Kaufmann Publishers Inc., San Francisco, CA, USA, 2001. 

[5] S. Collette, L. Cucu, and J. Goossens. Integrating job parallelism in real-time scheduling theory. Information Processing 
Letters, 106(5):180-187, May 2008. 

[6] L. Cucu-Grosjean and J. Goossens. Exact schedulability tests for real-time scheduling of periodic tasks on unrelated 
multiprocessor platforms. Journal of Systems Architecture, 57(5):561-569, May 2011. 

[7] M. L. Dertouzos and A. K. Mok. Multiprocessor online scheduling of hard-real-time tasks. IEEE Transactions on 
Software Engineering, 15(12):1497-1506, 1989. 

[8] J. Goossens and V. Berten. Gang FTP scheduling of periodic and parallel rigid real-time tasks. In Real-Time and 
Network Systems, pages 189-196, November 2010. 

[9] S. Gorlatch and H. Bischof. A generic MPI implementation for a data-parallel skeleton: Formal derivation and 
application to FFT. Parallel Processing Letters, 8(4):447-458, 1998. 

[10] W. Gropp, editor. Using MPI: portable parallel programming with the message-passing interface. Cambridge, MIT Press, 
2nd edition, 1999. 

[11] R. Ha and J. W.-S. Liu. Validating timing constraints in multiprocessor and distributed real-time systems. In 14th 
International Conference on Distributed Computing Systems, pages 162-171, 1994. 

[12] S. Han and M. Park. Predictability of least laxity first scheduling algorithm on multiprocessor real-time systems. In 
Springer, editor, Emerging Directions in Embedded and Ubiquitous Computing, volume 4097, pages 755-764, 2006. 

[13] S. Kato and Y. Ishikawa. Gang EDF scheduling of parallel task systems. In 30th IEEE Real-Time Systems Symposium, 
pages 459-468, 2009. 

[14] K. Lakshmanan, S. Kato, and R. Rajkumar. Scheduling parallel real-time tasks on multi-core processors. In 31st IEEE 
Real-Time Systems Symposium, pages 259-268, 2010. 

[15] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 
20:46-61, January 1973. 

[16] G. Manimaran, C. Siva Ram Murthy, and K. Ramamritham. A new approach for scheduling of parallelizable tasks in 
real-time multiprocessor systems. Real-Time Systems, 15:39-60, 1998. 






